Decomposition Methods for Solving Finite-Horizon Large MDPs
Authors

Abstract
Conventional algorithms for solving Markov decision processes (MDPs) become intractable on large finite state and action spaces. Several studies have addressed this issue, but most of them treat only infinite-horizon MDPs. This paper is one of the first works to deal with non-stationary finite-horizon MDPs. It proposes a new decomposition approach that partitions the problem into smaller restricted MDPs; each restricted MDP is then solved independently, in a specific order, using the proposed hierarchical backward induction (HBI) algorithm, which is based on the backward induction (BI) algorithm. The local solutions are then combined to obtain a global solution. An example on racetrack problems demonstrates the performance of the proposed technique.
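As context for the abstract, the following is a minimal sketch of standard backward induction (BI) on a finite-horizon MDP, the building block on which the paper's HBI algorithm is based. The three-state toy MDP, its dynamics, and its reward function are illustrative assumptions for this sketch; they are not the paper's racetrack example, and the partitioning into restricted MDPs is not shown.

```python
# Backward induction (BI) on a tiny finite-horizon MDP.
# The MDP below is a hypothetical toy problem: action "a" moves one state
# to the right (capped at state 2), action "b" stays put, and taking "a"
# in state 1 earns reward 1.

H = 3                    # horizon (number of decision stages)
states = [0, 1, 2]
actions = ["a", "b"]

def transition(s, a):
    # deterministic toy dynamics
    return min(s + 1, 2) if a == "a" else s

def reward(s, a):
    return 1.0 if (s == 1 and a == "a") else 0.0

# V[t][s] = optimal value with t stages to go; policy[t][s] = best action.
V = [{s: 0.0 for s in states}]       # terminal values V_0(s) = 0
policy = []
for t in range(1, H + 1):
    Vt, pi_t = {}, {}
    for s in states:
        # Bellman backup: pick the action maximizing immediate reward
        # plus the value of the successor state at the previous stage.
        best = max(actions, key=lambda a: reward(s, a) + V[-1][transition(s, a)])
        pi_t[s] = best
        Vt[s] = reward(s, best) + V[-1][transition(s, best)]
    V.append(Vt)
    policy.append(pi_t)

print(V[H])   # optimal values at the initial stage
```

Because values are propagated backwards from the terminal stage, the resulting policy is non-stationary: `policy[t]` can differ across stages, which is exactly the setting the paper targets.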
Similar resources
Lazy Approximation for Solving Continuous Finite-Horizon MDPs
Solving Markov decision processes (MDPs) with continuous state spaces is a challenge due to, among other problems, the well-known curse of dimensionality. Nevertheless, numerous real-world applications such as transportation planning and telescope observation scheduling exhibit a critical dependence on continuous states. Current approaches to continuous-state MDPs include discretizing their tra...
Lazy Approximation: A New Approach for Solving Continuous Finite-Horizon MDPs
Trial-Based Heuristic Tree Search for Finite Horizon MDPs
Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...
Divide-and-Conquer Methods for Solving MDPs
The Markov Decision Process (MDP) is the principal theoretical formalism in the area of Reinforcement Learning (RL). An import from optimal control in operations research, this construct is generic enough to represent problems comprising almost all of AI research, but consequently, it suffers from the curse of dimensionality where learning involves an exponential number of parameters. Researche...
Analysis of methods for solving MDPs
New proofs for two extensions to value iteration are derived when the type of initialisation of the value function is considered. Theoretical requirements that guarantee the convergence of backward value iteration and weaker requirements for the convergence of backups based on best actions only are identified. Experimental results show that standard value iteration performs significantly faster...
Journal
عنوان ژورنال: Journal of Mathematics
سال: 2022
ISSN: 2314-4785, 2314-4629
DOI: https://doi.org/10.1155/2022/8404716